Comparative Analysis of Codon Usage Bias in Six Eimeria Genomes

The codon usage bias (CUB) of genes encoded by different species’ genomes varies greatly. The analysis of codon usage patterns enriches our comprehension of genetic and evolutionary characteristics across diverse species. In this study, we performed a genome-wide analysis of CUB and its influencing factors in six sequenced Eimeria species that cause coccidiosis in poultry: Eimeria acervulina, Eimeria necatrix, Eimeria brunetti, Eimeria tenella, Eimeria praecox, and Eimeria maxima. The GC content of protein-coding genes varies between 52.67% and 58.24% among the six Eimeria species. The distribution trend of GC content at different codon positions follows GC1 > GC3 > GC2. Most high-frequency codons tend to end with C/G, except in E. maxima. Additionally, there is a positive correlation between GC3 content and GC3s/C3s, but a significantly negative correlation with A3s. Analysis of the ENC-Plot, neutrality plot, and PR2-bias plot suggests that selection pressure has a stronger influence than mutational pressure on CUB in the six Eimeria genomes. Finally, we identified from 11 to 15 optimal codons, with GCA, CAG, and AGC being the most commonly used optimal codons across these species. This study offers a thorough exploration of the relationships between CUB and selection pressures within the protein-coding genes of Eimeria species. Genetic evolution in these species appears to be influenced by mutations and selection pressures. Additionally, the findings shed light on unique characteristics and evolutionary traits specific to the six Eimeria species.


Introduction
In the genetic code, an individual amino acid may correspond to multiple codons, a phenomenon known as codon degeneracy. Codons encoding the same amino acid are termed synonymous codons. Although degeneracy allows several codons to encode a single amino acid, organisms exhibit distinct preferences for specific synonymous codons. The codon is the fundamental unit of information for mRNA translation, with 61 sense codons encoding 20 distinct amino acids [1]. However, across genes and genomes, the choice among synonymous codons is non-random: different organisms prefer specific codons when encoding a given amino acid, a phenomenon known as codon usage bias (CUB) [2]. Although synonymous mutations were traditionally viewed as "silent" because they do not alter the protein sequence, research suggests that codon selection during evolution is not entirely neutral [3,4]. In a given species, certain synonymous codons, termed optimal codons, are favoured, while others are used less frequently. Furthermore, codon usage patterns can affect biological processes such as mRNA synthesis, the rate of translation elongation, protein folding, and other downstream cellular functions [5][6][7]. Specific synonymous substitutions have notable fitness and phenotypic effects across various organisms, including vertebrates and invertebrates [8].
It is widely recognised that CUB is mainly influenced by mutation pressure, natural selection, and random genetic drift [9,10]. This preference is closely linked to GC content, gene expression level, gene length, tRNA abundance, protein structure, and RNA stability [11][12][13][14]. For instance, highly expressed genes tend to favour codons that match abundant tRNAs, resulting in CUB. The development of CUB is influenced by the interplay between translational selection and mutation pressure [15,16]. Analysing codon usage patterns can offer insights into the evolutionary and adaptive processes of different species, as codon usage may vary among species or even within the same species due to different evolutionary pressures [17]. Additionally, investigating the CUB of pathogens can offer valuable insights into the regulation of pathogenic gene expression, thus contributing to the development of more effective vaccine strategies.
Chicken coccidiosis is a prevalent and severe parasitic disease of poultry [18]. It manifests as an acute epidemic protozoal infection caused by one or more species of coccidia [19]. The disease poses a significant threat to young chicks and is particularly prevalent among chickens aged 20-45 days [20]. Its incidence is highest during seasons characterised by temperatures of 25-30 °C and heavy rainfall [21]. The occurrence rate of coccidiosis can reach approximately 75%, with mortality rates ranging from 20% to 50% [22]. Affected chicks experience stunted growth and slow weight gain upon recovery. While adult chickens typically remain asymptomatic, carriers may exhibit reduced weight gain and egg production, thereby serving as important vectors for coccidiosis [23]. Chicken coccidiosis inflicts substantial economic losses on the global poultry industry every year [24,25]. The causative coccidia belong to the phylum Apicomplexa, class Sporozoa, order Eucoccidiorida, family Eimeriidae, and genus Eimeria [26]. Globally, seven species of chicken coccidia have been identified: Eimeria acervulina, Eimeria necatrix, Eimeria brunetti, Eimeria tenella, Eimeria praecox, Eimeria maxima, and Eimeria mitis [27]. These species differ in their sites of parasitism and in pathogenicity. Notably, E. necatrix and E. tenella are among the most pathogenic species, with E. necatrix predominantly parasitising the mid-portion of the small intestine and E. tenella inhabiting the ceca [28].
At present, complete genome sequences have been reported for six species of the genus Eimeria. This study aims to analyse genome-wide codon preferences in the protein-coding genes of these six Eimeria species using programming languages and bioinformatics tools. By comprehensively comparing codon usage patterns, this study seeks to support heterologous gene expression, improved resistance to coccidiosis, and the development of vaccines against parasitic diseases. Additionally, this research aims to lay the groundwork for functional genomics and phylogenetic studies in Eimeria, contributing to our understanding of gene origin, protein expression, and gene evolution.

Analysis of Nucleotide Composition and Codon Usage in Eimeria
The GC content of the protein-coding genes in the six Eimeria genomes ranged from 52.67% to 58.24%, with E. maxima having the lowest value and E. necatrix the highest. The coding sequences (CDSs) of all six species contained more G and C than A and T nucleotides. In every species, the average GC1 (GC content at the first codon position) exceeded both GC3 (third codon position) and GC2 (second codon position), giving the trend GC1 > GC3 > GC2 (Table 1). GC1 ranged from 61.99% (E. maxima) to 66.09% (E. brunetti), GC3 from 48.71% (E. maxima) to 59.75% (E. tenella), and GC2 from 45.84% (E. praecox) to 50.40% (E. necatrix). A comparable trend was seen at the third position of synonymous codons: GC3s (GC content at the third position of synonymous codons) ranged from 47.43% (E. maxima) to 58.80% (E. necatrix) and exceeded 50% in every species except E. maxima.
Abbreviations: GC, GC content of all codons; GC1, GC content at the first position of codons; GC2, GC content at the second position of codons; GC3, GC content at the third position of codons; GC3s, GC content at the third position of synonymous codons; ENC, effective number of codons.
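As an illustration of how the position-specific GC values reported above can be obtained, the following is a minimal Python sketch, not the pipeline used in this study. The function name `gc_by_position` is hypothetical, and the sketch assumes each CDS is supplied in frame as a DNA string. Note that GC3 here counts every codon, whereas GC3s would additionally exclude Met, Trp, and stop codons.

```python
def gc_by_position(cds: str) -> dict:
    """Return GC, GC1, GC2 and GC3 (percentages) for one in-frame CDS."""
    seq = cds.upper()
    seq = seq[: len(seq) - len(seq) % 3]  # drop any trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if not codons:
        raise ValueError("sequence is shorter than one codon")

    def gc_percent(bases):
        # Percentage of G or C among the supplied bases.
        return 100.0 * sum(b in "GC" for b in bases) / len(bases)

    return {
        "GC": gc_percent(seq),
        "GC1": gc_percent([c[0] for c in codons]),
        "GC2": gc_percent([c[1] for c in codons]),
        "GC3": gc_percent([c[2] for c in codons]),
    }
```

Genome-level values such as those in Table 1 would then be averages of these per-gene values over all CDSs of a species.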
The overall RSCU (relative synonymous codon usage) values of the six Eimeria genomes were calculated (Figure 1). There are 26 codons with RSCU values greater than 1 in E. acervulina, E. necatrix, and E. tenella, 27 in both E. brunetti and E. praecox, and 28 in E. maxima. Most of these high-frequency codons end with C/G. In E. maxima, however, high-frequency codons tend to end with A/T rather than C/G, which is consistent with E. maxima being the only one of the six species with GC3 below 50%. Conversely, 31 codons have RSCU values below 1 in E. maxima, 32 in E. brunetti and E. praecox, and 33 in E. acervulina, E. necatrix, and E. tenella. Except in E. maxima, most of these low-frequency codons end with A/T.
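The RSCU of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 mark over-represented codons. The sketch below is one way to compute it in Python; it assumes the standard genetic code applies to these nuclear genes, and the names `CODON_TABLE` and `rscu` are illustrative rather than taken from the study.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} pooled over a list of in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group sense codons into synonymous families.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if len(codons) < 2 or total == 0:
            continue  # Met and Trp have no synonyms; skip unobserved amino acids
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values
```

Counting the entries of the returned dictionary that exceed 1 gives the kind of tally of high-frequency codons reported above.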

Assessing the Correlation between Codon Usage Metrics
We observed a significant positive correlation between GC3 content and GC3s across the six Eimeria species (p < 0.001). There was also a significant positive correlation between GC3 or GC3s and CBI (codon bias index) across these species. Furthermore, GC3 showed a significant negative correlation with A3s but a significant positive correlation with C3s in the same set of species (p < 0.001). CBI exhibited a significant positive correlation with FOP (frequency of optimal codons) across all Eimeria species (p < 0.001), and the L_sym (number of synonymous codons) index showed a significant positive correlation with the L_aa (length in amino acids) index across all six species (p < 0.001). These results indicate that nucleotide composition can affect the CUB of genes in Eimeria species (Figure 2).

ENC-Plot Analysis
The average ENC values of the six Eimeria species ranged from 47.37 ± 8.70 to 51.93 ± 5.65, with E. praecox having the lowest value and E. acervulina the highest. The average ENC values were 47.67 ± 7.89 for E. brunetti, 49.30 ± 7.28 for E. necatrix, 50.25 ± 6.97 for E. tenella, and 51.17 ± 6.97 for E. maxima, suggesting a generally weak, near-random codon usage pattern across the Eimeria genomes (Table 1). Between 1.22% (E. acervulina) and 7.32% (E. brunetti) of the genes in each species had ENC values below 35, indicating that these genes have a strong codon bias.
To assess the relationship between synonymous codon usage patterns and ENC across all genes within each Eimeria genome, an ENC-plot was constructed. Most genes in each species lay far below the expected ENC-plot curve, and only a small number of genes fell on the expected curve (Figure 3). This analysis indicates that selection pressure was the main factor affecting CUB in the six Eimeria species, whereas codon usage in only a small number of coding genes is explained by mutational pressure alone.

Figure 2. [...] a positive correlation, while dark red indicates a negative correlation. A higher value indicates a stronger correlation. A single asterisk (*) denotes a statistically significant correlation between two indices at p < 0.05, and double asterisks (**) denote a significant correlation at p < 0.001. The six Eimeria species, listed from left to right and from top to bottom, are E. acervulina, E. necatrix, E. brunetti, E. tenella, E. praecox, and E. maxima. T3s, C3s, A3s, GC3s: base compositions at the third position of synonymous codons. CAI: codon adaptation index. CBI: codon bias index. Fop: frequency of optimal codons. Nc: effective number of codons. L_sym: number of synonymous codons. L_aa: length in amino acids. Gravy: grand average of hydropathicity. Aromo: aromaticity.

We also calculated the ENC frequency distribution for each of the six Eimeria species to quantify the discrepancy between observed ENC (ENCobs) and expected ENC (ENCexp) values. Between 70.51% and 82.20% of genes fell outside the interval from −0.05 to 0.05: the E. brunetti and E. praecox genomes had proportions above 80%, while the remaining four genomes had proportions between 70.51% and 76.59%. These data indicate that natural selection is the main factor shaping codon usage in most protein-coding genes of the six Eimeria species, although some genes are also affected by mutational bias, and they further suggest that GC3s is largely responsible for the formation of codon bias within these genomes.
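For readers who wish to reproduce this kind of comparison, the sketch below computes the expected ENC from GC3s using Wright's standard curve, ENCexp = 2 + s + 29 / (s² + (1 − s)²), and then a relative deviation. The text does not state how the deviation was defined, so the ratio (ENCexp − ENCobs) / ENCexp used here is an assumption based on common practice, and all function names are hypothetical.

```python
def expected_enc(gc3s: float) -> float:
    """Expected ENC when codon usage depends on GC3s alone (gc3s as a fraction)."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def enc_ratio(enc_obs: float, gc3s: float) -> float:
    """Assumed deviation measure: (ENCexp - ENCobs) / ENCexp."""
    enc_exp = expected_enc(gc3s)
    return (enc_exp - enc_obs) / enc_exp


def fraction_outside(genes, lo=-0.05, hi=0.05):
    """Fraction of genes whose ratio lies outside [lo, hi].

    genes is an iterable of (observed ENC, GC3s fraction) pairs.
    """
    ratios = [enc_ratio(enc_obs, gc3s) for enc_obs, gc3s in genes]
    if not ratios:
        raise ValueError("no genes supplied")
    return sum(r < lo or r > hi for r in ratios) / len(ratios)
```

Under this definition, a gene lying on the expected curve has a ratio of 0, and a large positive ratio means the observed ENC is well below the value predicted from GC3s alone.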

PR2-Plot Analysis
The PR2-plot analysis was conducted to assess biases at the third codon position of four-fold degenerate amino acids among protein-coding genes across the six Eimeria species. According to Chargaff's second parity rule (PR2), within a single DNA strand the amount of A equals that of T and the amount of C equals that of G [29]. Each data point on the plot represents a gene, and the plot is divided into four quadrants. The centre of the plot, where both coordinates are 0.5, is the equilibrium point at which A = T and G = C; it signifies the absence of bias between selection and mutation forces on the complementary DNA strands [30].
The results indicate that the majority of genes in the six Eimeria species were distributed in the third quadrant. The mean values of GC bias [G3/(G3 + C3)] ranged from 45.41 (E. praecox) to 47.23 (E. brunetti), and those of AT bias [A3/(A3 + T3)] ranged from 39.62 (E. tenella) to 49.17 (E. praecox), suggesting a pronounced preference for C over G and T over A at the third codon position (Figure 4). This implies a tendency to use pyrimidines rather than purines at the third base of codons within Eimeria genomes. Therefore, the CUB of coding genes in the six Eimeria species is influenced not only by mutation but also significantly by other factors such as natural selection.
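The two PR2 coordinates defined above can be computed per gene as in the following Python sketch. It assumes that the four-fold degenerate set comprises Ala, Gly, Pro, Thr, Val and the four-codon blocks of Leu, Ser, and Arg under the standard genetic code; the text does not list the families used, and the names `FOURFOLD_PREFIXES` and `pr2_point` are hypothetical.

```python
# First two bases of codon families in which any third base gives the same
# amino acid (standard genetic code): Ala, Gly, Pro, Thr, Val, Leu, Ser, Arg.
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CT", "TC", "CG"}


def pr2_point(cds: str):
    """Return (G3/(G3+C3), A3/(A3+T3)) for one in-frame CDS, or None.

    Only third positions of four-fold degenerate codons are counted.
    None is returned when either denominator would be zero.
    """
    seq = cds.upper().replace("U", "T")
    third = {"A": 0, "T": 0, "G": 0, "C": 0}
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in third:
            third[codon[2]] += 1
    gc_total = third["G"] + third["C"]
    at_total = third["A"] + third["T"]
    if gc_total == 0 or at_total == 0:
        return None
    return third["G"] / gc_total, third["A"] / at_total
```

A gene with both coordinates below 0.5 falls in the quadrant described above, where C is preferred over G and T over A.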

Neutrality Plot Analysis
To gain additional insight into the effects of mutational pressure and natural selection on CUB in Eimeria genomes, we performed a neutrality plot analysis using the GC12 and GC3 values of each gene. When nucleotide changes alter the encoded amino acid, this signifies the presence of selection pressure. Conversely, a correlation between GC12 and GC3 likely indicates the influence of mutational forces, as the force shaping codon bias then operates across all codon positions [31]. If the slope of the regression line approaches 1, so that genes are distributed predominantly along the diagonal, CUB is influenced by mutational pressure alone. As the slope decreases towards 0, the influence of natural selection on CUB progressively strengthens. The results reveal statistically significant negative correlations between GC12 and GC3 across all six Eimeria species (p < 0.0001), with r values ranging from −0.04994 (E. tenella) to −0.5918 (E. praecox), and slopes of the regression line ranging from 0.03292 (E. tenella) to 0.3487 (E. praecox). These low slope values indicate that mutational pressure is not the predominant force: in E. praecox, the proportion of neutrality (mutation pressure) was 34.87% and the proportion of constraint on GC3 (natural selection) was 65.13%, compared with 3.292% and 96.708%, respectively, in E. tenella (Figure 5). The neutrality plot thus shows that selection pressure exerted a greater influence than mutational pressure on CUB in the six Eimeria genomes.
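The slope used in this interpretation comes from an ordinary least-squares regression of GC12 on GC3 across genes. The following is a minimal Python sketch of that calculation using only the standard library; the function name `neutrality_regression` is hypothetical and the study's own software is not specified here.

```python
def neutrality_regression(gc3, gc12):
    """Least-squares fit of GC12 (y) on GC3 (x) across genes.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(gc3)
    if n != len(gc12) or n < 2:
        raise ValueError("need two equally long lists with at least two genes")
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    if sxx == 0 or syy == 0:
        raise ValueError("GC3 and GC12 must both vary across genes")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r
```

Following the reading given above, a slope of 0.3487 is taken as 34.87% neutrality, with the remaining 1 − 0.3487 = 0.6513 (65.13%) attributed to constraint on GC3.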

Correspondence Analysis
A correspondence analysis (COA) was conducted using the RSCU values of genome-wide protein-coding genes across the six Eimeria species to examine codon biases. Axis 1, Axis 2, Axis 3, and Axis 4 accounted on average for 13.20%, 8.03%, 4.24%, and 3.75% of the variation, respectively, with the first four axes together contributing an average cumulative variation of 29.23%. Axis 1 emerged as the primary factor influencing CUB. Pearson correlation analysis demonstrated a significant relationship (p < 0.05) between the coordinates of genes on the first axis and ENC, GC3s, GC3, and GC values across the six Eimeria species. To investigate the impact of GC content on CUB within these species, genes were colour-coded by GC content. Genes with GC content exceeding 60% or falling below 45% were concentrated on the left or right side of the coordinate axis, whereas genes with GC content between 45% and 60% were distributed on both sides of the axis (Figure 6). This observation underscores the influence of both selection pressure and mutation on CUB in Eimeria genomes.

Figure 6. Correspondence analysis (COA) utilising the relative synonymous codon usage (RSCU) values obtained from protein-coding genes within six Eimeria species. In the graphical representation, red denotes genes with GC content below 45%, green represents genes with GC content between 45% and 60%, and blue indicates genes with GC content exceeding 60%. The species included are labelled as follows: E. acervulina; E. necatrix; E. brunetti; E. tenella; E. praecox and E. maxima.

Optimal Codon Analysis of Eimeria Genomes
The comparative analysis revealed that E. acervulina, E. necatrix, E. brunetti, E. tenella, E. praecox, and E. maxima had 14, 13, 13, 15, 11, and 11 optimal codons (ΔRSCU > 0.08 and RSCU > 1), respectively. The majority of optimal codons in Eimeria species end with C or G, except in E. praecox and E. maxima. Among these, GCA, CAG, and AGC are the most commonly used optimal codons, being favoured in all six Eimeria species. Following closely are CAC, CUG, AAC, and ACA, which are preferred in five Eimeria species. UGC is an optimal codon in four Eimeria species. GAC, GGA, CCA, CGC, UCU, and GUG serve as optimal codons in three of the six Eimeria species, and GGC, AAG, UCA, and UAC in two. Additionally, GAA, UUU, AGG, CGG, and GUU are each an optimal codon in only a single Eimeria species (Figure 7, Table S2).
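The two thresholds quoted above can be applied as in the Python sketch below. The text does not describe how the gene sets used for ΔRSCU were chosen, so the sketch simply assumes that RSCU tables for a high-bias gene set, a low-bias gene set, and the whole genome are already available; the function name `optimal_codons` and the toy values in the usage note are invented for illustration only.

```python
def optimal_codons(rscu_high, rscu_low, rscu_all, delta_min=0.08, rscu_min=1.0):
    """Return codons with delta RSCU > delta_min and overall RSCU > rscu_min.

    rscu_high / rscu_low: {codon: RSCU} for the two contrasted gene sets
    (how these sets are defined is an assumption left to the caller).
    rscu_all: {codon: RSCU} for all genes of the genome.
    """
    optimal = []
    for codon, overall in rscu_all.items():
        delta = rscu_high.get(codon, 0.0) - rscu_low.get(codon, 0.0)
        if delta > delta_min and overall > rscu_min:
            optimal.append(codon)
    return sorted(optimal)
```

With toy alanine values in which only GCA is both enriched in the high-bias set and above 1 overall, the function would return a list containing just that codon.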

Comparative Analysis of Codon Usage between Eimeria and Other Organisms
Utilising the Codon Usage Database, we conducted a comparative analysis of codon usage frequencies between E. tenella and various other species to ascertain similarities in codon usage preferences. We specifically examined codons in E. tenella (ET) that displayed frequency ratios ≥2 or ≤0.5 when compared with those in Gallus gallus (GG), Toxoplasma gondii (TG), Plasmodium vivax (PV), Cryptosporidium parvum (CP), Entamoeba histolytica (EH), Mus musculus (MM), and Homo sapiens (HS). For each species pair, we identified 2, 5, 18, 35, 43, 4, and 3 such codons, respectively. A lower count of such codons suggests a smaller disparity in synonymous CUB between the two species. Consequently, E. tenella displays closer alignment in codon usage preferences with Gallus gallus, Toxoplasma gondii, Mus musculus, and Homo sapiens, while notable discrepancies are observed with Plasmodium vivax, Cryptosporidium parvum, and Entamoeba histolytica (Table 2). The identical analysis performed on E. acervulina, E. necatrix, E. brunetti, E. praecox, and E. maxima produced findings analogous to those observed in E. tenella.
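The ratio criterion described above amounts to a simple per-codon filter. The sketch below shows one way to apply it in Python, assuming codon frequencies (for example per thousand codons) for the two species are held in dictionaries; the function name `divergent_codons` is hypothetical.

```python
def divergent_codons(freq_a, freq_b, high=2.0, low=0.5):
    """Return codons whose frequency ratio a/b is >= high or <= low.

    freq_a, freq_b: {codon: frequency} for the two species being compared.
    Codons missing from either table, or with zero frequency in freq_b
    (ratio undefined), are skipped.
    """
    flagged = []
    for codon in sorted(set(freq_a) & set(freq_b)):
        if freq_b[codon] == 0:
            continue
        ratio = freq_a[codon] / freq_b[codon]
        if ratio >= high or ratio <= low:
            flagged.append(codon)
    return flagged
```

The length of the returned list corresponds to the per-pair counts reported above, so a shorter list indicates more similar codon usage between the two species.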

Discussion
Numerous studies have demonstrated that CUB is influenced by a complex interplay between mutational processes and selective pressures throughout the evolutionary history of organisms [32]. Codon selection plays a crucial role in regulating gene expression, as optimal codons can enhance both the efficiency and accuracy of translation [33]. A plethora of biochemical, genetic, biophysical, and bioinformatics investigations have demonstrated that codon preference affects various gene regulatory mechanisms, encompassing protein translation, co-translational folding, transcription, and post-transcriptional regulatory processes [34]. Moreover, codon usage profoundly influences gene expression and protein functionality across a spectrum of organisms, from lower to higher organisms. In the context of expressing heterologous genes, optimising the codons of the target gene to match the preferred codons of the host species can significantly improve gene expression efficiency [35]. Several studies have highlighted the significant impact of base composition on codon preference [36]. Furthermore, preferences in the use of bases, codons, and amino acids are also influenced by factors such as gene expression levels, gene functions, and the evolutionary development of the species [37].
The current investigation focuses on elucidating CUB within the genomes of six Eimeria species. The protein-coding genes of all six species are GC-rich. Across the six genomes, GC3 content ranges from 48.71% to 59.75% and exceeds GC2, with the distribution pattern GC1 > GC3 > GC2. This investigation demonstrates an unbalanced use of GC and AT bases at the third codon position in Eimeria genomes. In four-codon amino acids, G/C-ending codons appear to be preferred over A/T-ending codons. Additionally, codons ending in C are favoured over those ending in G, and codons ending in T are favoured over those ending in A, demonstrating a preference for pyrimidine bases at the third position. These observations suggest an influence of GC3 bias on codon usage patterns. The human genome exhibits a preference for G or C, particularly in synonymous codons terminating with C [38]. This preference for C or G at the third codon position is also observed in species such as Caenorhabditis elegans, Daphnia pulex, and Drosophila melanogaster [39]. Conversely, species like Borrelia burgdorferi, Mycoplasma capricolum, Onchocerca volvulus, and Plasmodium falciparum exhibit a preference for A or T at the third codon position [40][41][42]. Studies in mammals have revealed that genes with higher GC content tend to exhibit elevated expression levels compared with those with lower GC content [43], warranting further investigation into the potential correlation between Eimeria gene expression levels and GC content.
The CDSs in the six Eimeria species displayed an average ENC ranging from 47.37 ± 8.70 to 51.93 ± 5.65. If codon usage is solely influenced by GC3 content, it suggests the presence of mutational pressure. In such instances, the ENC values tend to be slightly higher than the expected ENC curve [44]. In all six Eimeria genomes, most CDSs showed ENC values below the expected curve, with 1.22% to 7.32% of their genes exhibiting an ENC below 35, suggesting a significant preference for specific codons and highlighting the dominant influence of natural selection pressure. We also explored the influence of natural selection pressure through the neutrality plot analysis. Mutational pressure is inferred to be the primary determinant of CUB when the gradient of the regression line approaches 1 and the correlation between GC12 and GC3 is statistically significant. Conversely, gradients nearing 0, that is, a nearly horizontal regression line, suggest that natural selection pressure predominantly shapes CUB. In the neutrality plot analysis, the slope of the regression line ranged from 0.03292 to 0.3487 in the six Eimeria species, and most genes tended to diverge considerably from the regression line, further confirming the dominance of natural selection over mutational forces. Both the neutrality plot and PR2-plot analyses provided compelling evidence supporting the involvement of natural selection in codon bias within Eimeria. This study's findings underscore that, despite variations in codon usage indicators among Eimeria species, the CUB of protein-coding genes in these six species is influenced by both natural selection pressures and mutational processes. Notably, all six Eimeria species experienced robust natural selection pressures on their protein-coding genes, particularly when considering the base composition. Moreover, no correlation was observed between ENC and GRAVY or AROMO, indicating no influence of hydrophobicity or aromaticity on CUB. Similarly, no significant associations were found between CAI, GRAVY, and AROMO, suggesting minimal impact of these factors on gene expression.
This study analysed the CUB of six Eimeria species at the whole-genome level. The results indicate that there are 26-31 preferred codons in these genomes, with 11-15 of them being optimal codons. The CUB of the six Eimeria species is similar to that of Gallus gallus, Toxoplasma gondii, Mus musculus, and Homo sapiens, but markedly different from that of Plasmodium vivax, Cryptosporidium parvum, and Entamoeba histolytica. These findings suggest a potential co-evolutionary relationship between Eimeria and host genomes, with all six Eimeria species being well adapted to Gallus gallus. Phylogenetic analysis reveals that Eimeria is more closely related to T. gondii, implying that species with closer phylogenetic relationships tend to share more similar CUB. Coccidiosis has emerged as a significant health threat in poultry, necessitating urgent efforts toward vaccine development and therapeutic discovery. Eimeria species displayed significant adaptation to sequences from both Mus musculus and Homo sapiens, as evidenced by CUB. Therefore, cell lines derived from mice and humans may provide robust support for Eimeria gene replication. These findings provide valuable insights for selecting optimal experimental cell lines for vaccine development, heterologous gene expression studies, and research related to pathogenicity. Identifying distinct codon patterns is essential for understanding gene expression and evolutionary impacts on the genome. It also aids in phylogenetic analysis and optimising gene expression through codon optimisation. Numerous studies emphasise the association between CUB and gene expression levels, impacting translation efficiency throughout the proteome [45]. Translational selection affects both codon and amino acid usage, with highly expressed genes favouring amino acids with low or intermediate size/complexity scores, such as alanine and glycine, and disfavouring those with high scores, such as cysteine [46].
Different organisms use different codons at varying frequencies to encode the same amino acid. To enhance the expression of exogenous proteins, optimising inserted foreign gene sequences based on the codon bias of the target organism is essential. This optimisation primarily aims to reduce rare codon usage, thereby increasing translation speed and lowering error rates [47]. Moreover, optimised genes often feature higher guanine and cytosine nucleotide content, which enhances mRNA stability and potentially improves mRNA transport efficiency from the nucleus to the cytoplasm, thus boosting exogenous protein expression. Advances in biotechnology have facilitated the prediction of optimised gene sequences based on the target protein's sequence, followed by the artificial synthesis of these optimised foreign genes. In vaccine development, codon optimisation has proven effective in enhancing antigen protein expression levels and is widely applied with successful outcomes [48]. Regarding the adaptation of Eimeria species to host species, they typically rely on the host cell's gene expression machinery to synthesise their own proteins. This includes utilising the host's tRNA molecules for translation. Eimeria species have evolved mechanisms to interact with and manipulate host cell processes, allowing them to exploit host resources for their own replication and survival within the host's intestinal cells during infection [49]. This adaptation is critical for their lifecycle and pathogenicity in poultry. Future research should investigate correlations among codon usage, amino acid frequency, and expression levels. Furthermore, our findings offer theoretical guidance for the functional genomic study of Eimeria genomes and for vaccine strategies against coccidiosis in poultry.

Genomic Data
The genomic data and annotations for E. acervulina, E. necatrix, E. brunetti, E. tenella, E. praecox, and E. maxima were retrieved from the NCBI genome database (https://www.ncbi.nlm.nih.gov/genome/, accessed on 1 June 2024) (Table S1). A customised Python script was used to filter genes according to the following CDS criteria: sequences exceeding 300 base pairs in length, with the number of bases being a multiple of three, and containing complete start and stop codons. After filtering, a total of 5717 coding genes for E. acervulina, 7222 for E. necatrix, 8006 for E. brunetti, 6519 for E. tenella, 7096 for E. praecox, and 5205 for E. maxima were retained in the filtered sequence files.
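The filtering criteria above can be sketched in a few lines of Python. This is only an illustration, not the authors' script: the function names are hypothetical, ATG is assumed to be the only accepted start codon, and the standard stop codons (TAA, TAG, TGA) are assumed.

```python
# Illustrative sketch only; names are hypothetical and not taken from the study.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed: standard stop codons


def passes_cds_filter(seq, min_len=300, start_codon="ATG"):
    """Return True if a CDS satisfies the three criteria described in the text."""
    seq = seq.upper()
    return (
        len(seq) > min_len              # longer than 300 bp
        and len(seq) % 3 == 0           # length is a multiple of three
        and seq[:3] == start_codon      # complete start codon (assumed ATG)
        and seq[-3:] in STOP_CODONS     # complete stop codon
    )


def filter_cds(records):
    """Keep only the name -> sequence entries that pass the filter."""
    return {name: seq for name, seq in records.items() if passes_cds_filter(seq)}
```

For example, `filter_cds({"gene1": "ATG" + "GCC" * 100 + "TAA"})` keeps `gene1`, whereas a 36 bp sequence or one lacking a stop codon would be dropped.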

Calculation of Codon Related Parameters
Among the 64 codons, ATG and TGG are the sole codons for methionine and tryptophan, respectively, while TAA, TAG, and TGA function as stop codons. These five codons were excluded, and subsequent bioinformatics analysis focused on the remaining codons. The effective number of codons (ENC) reflects the codon diversity within a gene. An ENC value of 20 indicates that only one codon is used per amino acid (extreme bias), while a value of 61 indicates that all synonymous codons are used equally (no bias). The nucleotide composition of CDSs was assessed, focusing specifically on various GC-related metrics. GC denotes the overall content of guanine (G) and cytosine (C) nucleotides in each gene, while GC1, GC2, and GC3 represent the G and C content at the first, second, and third positions of the codons in the gene, respectively. Additionally, GC12 denotes the average of GC1 and GC2. Furthermore, T3s, C3s, A3s, and G3s denote the frequencies of thymine (T), cytosine (C), adenine (A), and guanine (G) nucleotides, respectively, at the third position of synonymous codons within CDSs. Lastly, GC3s represents the GC content at the third position of synonymous codons. Grand average of hydropathicity (GRAVY) values, ranging from −2 to 2, are obtained by summing the hydropathy values of all amino acids in the encoded protein sequence and dividing by the number of residues. Positive and negative GRAVY values represent hydrophobic and hydrophilic proteins, respectively. The aromaticity (AROMO) value reflects the frequency of aromatic amino acids (Phe, Tyr, and Trp). GRAVY and AROMO values serve as indicators of amino acid usage, and changes in amino acid composition can impact codon usage analysis results. Relative synonymous codon usage (RSCU) denotes the ratio of the observed frequency of a codon to its expected frequency, assuming equal usage of all synonymous codons for the same amino acid. RSCU values greater than 1 signify positive codon bias, values less than 1 indicate negative bias, and values equal to 1 denote no bias (random codon usage). DAMBE 7.3.11 software [50], CodonW 1.4.2 software [51], and custom Biopython scripts were used to analyse the aforementioned parameters.
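Two of the parameters defined above, position-specific GC content and RSCU, can be illustrated with a short standard-library Python sketch. It is not the DAMBE, CodonW, or Biopython code used in the study; the function names are hypothetical, the standard genetic code is assumed, and the GC function counts every codon it is given, so start and stop codons would need to be trimmed beforehand if they are to be excluded.

```python
from collections import Counter
from itertools import product

# Standard genetic code (assumed), built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gc_by_position(seq):
    """GC fraction at codon positions 1, 2 and 3, plus GC12 (mean of GC1 and GC2)."""
    seq = seq.upper()
    fractions = []
    for pos in range(3):
        bases = seq[pos::3]  # every third base, starting at this codon position
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    gc1, gc2, gc3 = fractions
    return {"GC1": gc1, "GC2": gc2, "GC3": gc3, "GC12": (gc1 + gc2) / 2}


def rscu(seq):
    """RSCU = observed codon count / mean count of its synonymous family."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa in "MW*":  # Met, Trp and stop codons are excluded, as in the text
            continue
        families.setdefault(aa, []).append(codon)
    values = {}
    for syn in families.values():
        total = sum(counts[c] for c in syn)
        if total == 0:  # amino acid absent from this sequence
            continue
        for c in syn:
            values[c] = counts[c] * len(syn) / total
    return values
```

With the toy input `"GCCGCCGCCGCT"` (three GCC and one GCT for alanine), GCC receives an RSCU of 3.0 and GCT an RSCU of 1.0, while the unused GCA and GCG receive 0.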

ENC-Plot Analysis
The ENC-Plot is commonly used to assess whether codon usage in a particular gene is influenced solely by mutation or also by other factors. The ENC-Plot was generated using the R programming language, with ENC values plotted on the ordinate and GC3s values on the abscissa. The expected ENC curve is calculated by the formula $\mathrm{ENC}_{\mathrm{exp}} = 2 + \mathrm{GC3s} + 29/[\mathrm{GC3s}^{2} + (1 - \mathrm{GC3s})^{2}]$ [52]. When the data points cluster around this expected curve, it suggests that mutation pressure alone contributes to codon bias formation. Conversely, if data points deviate significantly from the expected curve, it indicates the involvement of other factors, such as natural selection, in shaping codon bias. Furthermore, we evaluated the discrepancies between the expected and observed ENC values using the ENC ratio index, calculated as $\mathrm{ENC}_{\mathrm{ratio}} = (\mathrm{ENC}_{\mathrm{exp}} - \mathrm{ENC}_{\mathrm{obs}})/\mathrm{ENC}_{\mathrm{exp}}$. The ENC ratio quantifies the extent of variation between the expected and observed ENC values.
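As a worked illustration of the expected curve and the ENC ratio, the following minimal Python sketch evaluates both for a single gene. The function names are hypothetical, GC3s is taken as a fraction between 0 and 1, and the ratio is assumed to be normalised by the expected ENC, which is the usual convention.

```python
def enc_expected(gc3s):
    """Expected ENC under mutation pressure alone, for GC3s given as a fraction (0-1)."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def enc_ratio(enc_obs, gc3s):
    """Relative gap between expected and observed ENC (assumed denominator: expected ENC)."""
    expected = enc_expected(gc3s)
    return (expected - enc_obs) / expected
```

At GC3s = 0.5 the expected value is 2 + 0.5 + 29/0.5 = 60.5, so a gene with an observed ENC of 48.4 at that GC3s has an ENC ratio of 12.1/60.5 = 0.2 and lies below the curve.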

PR2-Plot Analysis
The PR2-plot was generated using the R programming language to analyse the proportional relationship between purines and pyrimidines at the third base of the four-codon degenerate amino acids. The four-codon degenerate codon families are alanine (GCT, GCG, GCC, GCA), glycine (GGT, GGG, GGC, GGA), proline (CCA, CCC, CCT, CCG), threonine (ACC, ACA, ACG, ACT), valine (GTT, GTG, GTC, GTA), leucine (CTA, CTC, CTG, CTT), serine (TCA, TCC, TCG, TCT), and arginine (CGA, CGC, CGG, CGT). We employed A3/(A3 + T3) as the vertical axis and G3/(G3 + C3) as the horizontal axis, where A3, T3, G3, and C3 denote the content of A, T, G, and C in the third codon position, respectively. These values for the four-codon degenerate amino acids were computed using a custom Biopython script. The vector extending from the centre point (0.5, 0.5) to each data point indicates the direction and strength of the purine or pyrimidine bias at the third base of codons.
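The PR2 coordinates described above can be computed per gene as in the following sketch. It is a plain-Python stand-in for the custom Biopython script, which is not shown in the text; the function name and the choice to return `None` when a ratio is undefined are assumptions made for illustration.

```python
from collections import Counter

# First two bases of the eight four-codon families listed in the text
# (Ala, Gly, Pro, Thr, Val, and the four-codon boxes of Leu, Ser and Arg).
FOURFOLD_PREFIXES = ("GC", "GG", "CC", "AC", "GT", "CT", "TC", "CG")


def pr2_coordinates(seq):
    """Return (G3/(G3+C3), A3/(A3+T3)) over four-codon families, or None if undefined."""
    seq = seq.upper()
    third = Counter()
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            third[codon[2]] += 1
    at = third["A"] + third["T"]
    gc = third["G"] + third["C"]
    if at == 0 or gc == 0:  # a ratio would require division by zero
        return None
    return third["G"] / gc, third["A"] / at  # (x, y) as plotted
```

A gene falling at (0.5, 0.5) uses A and T, and G and C, equally at the third position; a point with x below 0.5 indicates that C is used more than G, and a point with y below 0.5 indicates that T is used more than A.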

Neutrality Plot Analysis
The neutrality plot is primarily employed to examine the relationship between GC12 and GC3. In this study, the neutrality plot was constructed using the R programming language to assess the interplay between mutational pressure and natural selection in shaping CUB within genes of the six Eimeria species. GC3 values were plotted on the abscissa, while GC12 values were plotted on the ordinate, and a regression line of GC12 against GC3 was fitted to the plot. The slope of the regression line, together with the significance of the correlation, indicates the extent to which mutational forces shape codon usage [53].
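The regression underlying the neutrality plot is an ordinary least-squares fit of GC12 on GC3. The study used R for this step; the sketch below is a minimal Python equivalent with a hypothetical function name, returning the slope, intercept, and Pearson correlation coefficient.

```python
def neutrality_fit(gc3, gc12):
    """Least-squares fit of GC12 (y) on GC3 (x); returns (slope, intercept, pearson_r)."""
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r
```

For instance, four genes with GC3 values of 0.4, 0.5, 0.6, and 0.7 and GC12 values of 0.50, 0.52, 0.54, and 0.56 give a slope of 0.2. A slope close to 0 is read as a predominance of selection, and a slope close to 1 as a predominance of mutational pressure.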

Analysis of Optimal Codons in Eimeria Genomes
The ENC values of the CDSs from each of the six Eimeria species were computed. Subsequently, the CDSs from each species were sorted by ENC value, and the lowest and highest 10% were separated to construct high- and low-expression libraries, respectively. Specifically, sequences with low ENC values were assigned to the high-expression library, while those with high ENC values formed the low-expression library. Following this, the RSCU and ∆RSCU values were calculated for each group. Codons with RSCU values exceeding 1 were considered high-frequency codons, whereas those with ∆RSCU values exceeding 0.08 were deemed high-expression codons. Codons meeting both criteria were designated as optimal codons.
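The selection procedure above can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than statements from the text: ∆RSCU is taken as the RSCU in the high-expression library minus the RSCU in the low-expression library, and the RSCU > 1 criterion is applied to RSCU values computed over the whole gene set.

```python
def split_by_enc(enc_by_gene, fraction=0.10):
    """Return (high_expression_genes, low_expression_genes) from the ENC extremes."""
    ranked = sorted(enc_by_gene, key=enc_by_gene.get)  # ascending ENC
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]  # lowest ENC = high expression, highest ENC = low


def optimal_codons(rscu_high, rscu_low, rscu_all, delta_cutoff=0.08, rscu_cutoff=1.0):
    """Codons with RSCU > 1 overall and delta-RSCU (high minus low) > 0.08."""
    selected = []
    for codon, value in rscu_all.items():
        delta = rscu_high.get(codon, 0.0) - rscu_low.get(codon, 0.0)
        if value > rscu_cutoff and delta > delta_cutoff:
            selected.append(codon)
    return sorted(selected)
```

The three RSCU dictionaries map codons to RSCU values pooled over the high-expression library, the low-expression library, and all genes, respectively.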

Figure 2. Correlation analysis among different indices across six Eimeria species. Dark blue indicates a positive correlation, while dark red indicates a negative correlation. A higher value indicates a stronger correlation. Asterisks (*) denote statistically significant correlations between the two indicators at a significance level of p < 0.05, and double asterisks (**) indicate significant correlations at the p < 0.001 level. The six Eimeria species, listed from left to right and from top to bottom, are E. acervulina, E. necatrix, E. brunetti, E. tenella, E. praecox, and E. maxima. T3s, C3s, A3s, GC3s: compositions at the third position of synonymous codons. CAI: codon adaptation index. CBI: codon bias index. Fop: frequency of optimal codons. Nc: effective number of codons. L_sym: number of synonymous codons. L_aa: length in amino acids. Gravy: grand average of hydropathicity. Aromo: aromaticity.

Figure 3. ENC-plot analysis of protein-coding genes across six Eimeria species. The red solid line represents the expected curve under the assumption that codon usage bias is solely influenced by mutation pressure. The species analysed are E. acervulina, E. necatrix, E. brunetti, E. tenella, E. praecox, and E. maxima.


Figure 5. Neutrality plot analysis of GC12 against GC3 for the protein-coding genes across six Eimeria species. The species analysed are E. acervulina, E. necatrix, E. brunetti, E. tenella, E. praecox, and E. maxima.


Figure 6. Correspondence analysis (COA) using the relative synonymous codon usage (RSCU) values obtained from protein-coding genes within six Eimeria species. Red denotes genes with GC content below 45%, green represents genes with GC content between 45% and 60%, and blue indicates genes with GC content exceeding 60%. The species included are E. acervulina, E. necatrix, E. brunetti, E. tenella, E. praecox, and E. maxima.

Figure 7. Optimal codon usage in six Eimeria species. Codons meeting the criteria of ∆RSCU > 0.08 and RSCU > 1 are marked with a single asterisk. Codons exhibiting high ∆RSCU are highlighted in yellow, while codons with low ∆RSCU are shown in purple.


Table 1. Average GC content and ENC values of protein-coding genes in six Eimeria species.


Table 2. Comparison of synonymous codon usage between E. tenella and other species.
ET GG TG PV CP EH MM HS ET/GG ET/TG ET/PV ET/CP ET/EH ET/MM ET/HS